想象一个图书馆,书籍不是按到达日期排列,而是通过一个 通用密钥。这标志着从顺序存储到 关联容器的范式转变。与其线性扫描一个 向量 ($O(N)$),我们使用 映射(map) 或 集合(set) 来实现对数时间查找($O(\log n)$)
1. 关联的本质
在 映射(map)中,我们存储 键值对。键充当索引,可以是字符串、自定义对象,或任何支持 严格弱序的类型。而 集合(set)则仅存储唯一键,是进行成员测试或筛选的理想工具。
2. 有序与无序
标准容器(映射(map), 集合(set))会保持键的排序。 C++11 标准 引入了无序变体(unordered_map),它们使用 哈希函数 实现平均 $O(1)$ 的性能。使用这些高性能桶时,请留意 C++11 标志 标志。
main.py
TERMINALbash — 80x24
> Ready. Click "Run" to execute.
>
QUESTION 1
What happens when you use the subscript operator
m[k] on a map if the key k does not exist?It throws an out_of_range exception.
It returns a null pointer.
It adds a new element with key k and a value-initialized value.
The program fails to compile.
✅ Correct!
Subscripting a map behaves differently from a vector; it automatically inserts missing keys.❌ Incorrect
Unlike at(), the [] operator is an insertion-on-demand mechanism.QUESTION 2
Which of the following calls is ILLEGAL for a multiset
c and a vector v?copy(v.begin(), v.end(), inserter(c, c.end()));copy(v.begin(), v.end(), back_inserter(c));copy(c.begin(), c.end(), back_inserter(v));copy(c.begin(), c.end(), inserter(v, v.end()));✅ Correct!
Correct. back_inserter requires push_back, which associative containers like multiset do not support.❌ Incorrect
Associative containers lack push_back, so back_inserter cannot be used with them.QUESTION 3
Identify the correct way to define a variable to hold the result of calling
find on a map<string, vector> >.vector<int> res = m.find("key");map<string, vector>::iterator it = m.find("key"); >pair<string, vector> p = m.find("key"); >bool found = m.find("key");✅ Correct!
find returns an iterator to the element if found, or m.end() otherwise.❌ Incorrect
The find method always returns an iterator, not the value or a boolean.QUESTION 4
What is the primary advantage of using a
set over a vector for storing a list of excluded words?A set uses less memory than a vector.
A set allows for duplicate excluded words.
A set provides O(log N) lookup instead of O(N) linear search.
A set allows elements to be accessed by index.
✅ Correct!
Using exclude.find(word) on a set is significantly more efficient than std::find on a vector for large datasets.❌ Incorrect
While a vector is simpler, its linear search ($O(N)$) becomes a bottleneck compared to a set's $O(\log N)$.QUESTION 5
In the expression
++word_count.insert({word, 0}).first->second;, what does first refer to?The key of the inserted element.
The boolean indicating if insertion was successful.
An iterator to the element with the given key.
The first element in the entire map.
✅ Correct!
map::insert returns a pair where first is an iterator to the element.❌ Incorrect
The return of insert is a pair<iterator, bool>. first is the iterator.Module Assessment: High-Performance Text Processing
Implementing the Text-Query Architecture
You are tasked with building a search engine component (Exercise 12.28). The system must read a file and allow users to query words, returning the line numbers where those words appear. You must decide on the most efficient container strategy without using custom classes initially.
Q
1. Design the data structure to hold the word-to-line-number mapping. Which combination is most efficient?
Solution:
A
A
map<string, set> > is the ideal choice. The map allows $O(\log N)$ lookup for the word (the key), and the set stores line numbers (values) uniquely and in ascending order, facilitating clean result output.Q
2. Write a code snippet to implement the word counting logic that ignores case and punctuation (Required: 100 words).
Solution:
To implement case-insensitive word counting:
To implement case-insensitive word counting:
for (char &c : word) c = tolower(c);
word.erase(remove_if(word.begin(), word.end(), ispunct), word.end());
++word_count[word];
This logic iterates through each character of the string, converting it to lowercase using tolower. Then, it utilizes the remove_if algorithm with ispunct to shift all punctuation marks to the end of the string, followed by erase to truncate them. Finally, it uses the map's subscript operator to increment the count. This ensures that 'Example!', 'example.', and 'Example' are all normalized to 'example' and counted under a single unique key, maintaining data integrity.Q
3. How would you modify the system to safely share this data with a 'QueryResult' object without duplicating the entire map?
Solution:
Use
Use
shared_ptr. Specifically, the QueryResult should hold a shared_ptr<set> >. By using QueryResult(sought, loc->second, file) where the second argument is a shared pointer to the line number set, the data remains valid even if the original TextQuery object's scope ends, while avoiding expensive deep copies.